What is Time Series Data and its Types?¶

Before jumping right into Time Series Ananlysis , lets first understand what is Time Series Data.

  • Time series data is a collection of observations obtained through repeated measurements over time. Plot the points on a graph, and one of your axes would always be time.
  • What sets time series data apart from other data is that the analysis can show how variables change over time.
  • The frequency of recorded data points may be hourly, daily, weekly, monthly, quarterly or annually.
  • In other words, time is a crucial variable because it shows how the data adjusts over the course of the data points as well as the final results. It provides an additional source of information and a set order of dependencies between the data.

  • Time Series Data have 2 types:

1 Measurements gathered at regular time intervals (metrics) 2 Measurements gathered at irregular time intervals (events)

image.png

This research follow 1 stock, if we follow 2 and over stocks, we talk about type of data - Panel data (longitudinal data)¶

What is Time Series Analysis ?¶

Now that we have understood what Time Series data means .. lets understand what is Time Series analysis?

  • Time-series analysis is a method of analyzing data to extract useful statistical information and characteristics.
  • A time series analysis encompasses statistical methods for analyzing time series data. These methods enable us to extract meaningful statistics, patterns and other characteristics of the data.
  • Time series are visualized with the help of line charts. So, time series analysis involves understanding inherent aspects of the time series data so that we can create meaningful and accurate forecasts.
  • One of the study's main goals is to predict future value.

Why organizations use time series data analysis?¶

  • Time series analysis helps organizations understand the underlying causes of trends or systemic patterns over time.
  • Using data visualizations, business users can see seasonal trends and dig deeper into why these trends occur.
  • When organizations analyze data over consistent intervals, they can also use time series forecasting to predict the likelihood of future events.

Time Series Analysis Types¶

Some of the models of time series analysis include -

  • Classification: It identifies and assigns categories to the data.

  • Curve Fitting: It plots data on a curve to investigate the relationships between variables in the data.

  • Descriptive Analysis: Patterns in time-series data, such as trends, cycles, and seasonal variation, are identified.

  • Explanative analysis: It attempts to comprehend the data and the relationships between it and cause and effect.

  • Segmentation: It splits the data into segments to reveal the source data's underlying properties.

Components of a Time-Series¶

  • Trend - The trend shows a general direction of the time series data over a long period of time. A trend can be increasing(upward), decreasing(downward), or horizontal(stationary).
  • Seasonality - The seasonality component exhibits a trend that repeats with respect to timing, direction, and magnitude. Some examples include an increase in water consumption in summer due to hot weather conditions.
  • Noise - Outliers or missing values
  • Cyclical Component - These are the trends with no set repetition over a particular period of time. A cycle refers to the period of ups and downs, booms and slums of a time series, mostly observed in business cycles. These cycles do not exhibit a seasonal variation but generally occur over a time period of 3 to 12 years depending on the nature of the time series.
  • Irregular Variation - These are the fluctuations in the time series data which become evident when trend and cyclical variations are removed. These variations are unpredictable, erratic, and may or may not be random.
  • ETS Decomposition - ETS Decomposition is used to separate different components of a time series. The term ETS stands for Error, Trend and Seasonality.

image.png

1 install library¶

In [1]:
import pandas as pd
from datetime import timedelta
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
from matplotlib import pyplot as plt
from typing import List
In [2]:
import math
In [3]:
import time
In [4]:
import plotly.express as px
In [5]:
import seaborn as sns
In [6]:
# stats tools
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [7]:
from tensorflow.keras import Sequential
In [8]:
pip install yfinance
Requirement already satisfied: yfinance in d:\anaconda\lib\site-packages (0.2.10)
Requirement already satisfied: pandas>=1.3.0 in d:\anaconda\lib\site-packages (from yfinance) (1.5.2)
Requirement already satisfied: requests>=2.26 in d:\anaconda\lib\site-packages (from yfinance) (2.28.1)
Requirement already satisfied: lxml>=4.9.1 in d:\anaconda\lib\site-packages (from yfinance) (4.9.1)
Requirement already satisfied: multitasking>=0.0.7 in d:\anaconda\lib\site-packages (from yfinance) (0.0.11)
Requirement already satisfied: cryptography>=3.3.2 in d:\anaconda\lib\site-packages (from yfinance) (38.0.4)
Requirement already satisfied: html5lib>=1.1 in d:\anaconda\lib\site-packages (from yfinance) (1.1)
Requirement already satisfied: frozendict>=2.3.4 in d:\anaconda\lib\site-packages (from yfinance) (2.3.4)
Requirement already satisfied: pytz>=2022.5 in d:\anaconda\lib\site-packages (from yfinance) (2022.7.1)
Requirement already satisfied: numpy>=1.16.5 in d:\anaconda\lib\site-packages (from yfinance) (1.23.5)
Requirement already satisfied: appdirs>=1.4.4 in d:\anaconda\lib\site-packages (from yfinance) (1.4.4)
Requirement already satisfied: beautifulsoup4>=4.11.1 in d:\anaconda\lib\site-packages (from yfinance) (4.11.1)
Requirement already satisfied: soupsieve>1.2 in d:\anaconda\lib\site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.3.2.post1)
Requirement already satisfied: cffi>=1.12 in d:\anaconda\lib\site-packages (from cryptography>=3.3.2->yfinance) (1.15.1)
Requirement already satisfied: webencodings in d:\anaconda\lib\site-packages (from html5lib>=1.1->yfinance) (0.5.1)
Requirement already satisfied: six>=1.9 in d:\anaconda\lib\site-packages (from html5lib>=1.1->yfinance) (1.16.0)
Requirement already satisfied: python-dateutil>=2.8.1 in d:\anaconda\lib\site-packages (from pandas>=1.3.0->yfinance) (2.8.2)
Requirement already satisfied: certifi>=2017.4.17 in d:\anaconda\lib\site-packages (from requests>=2.26->yfinance) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in d:\anaconda\lib\site-packages (from requests>=2.26->yfinance) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in d:\anaconda\lib\site-packages (from requests>=2.26->yfinance) (1.26.14)
Requirement already satisfied: charset-normalizer<3,>=2 in d:\anaconda\lib\site-packages (from requests>=2.26->yfinance) (2.0.4)
Requirement already satisfied: pycparser in d:\anaconda\lib\site-packages (from cffi>=1.12->cryptography>=3.3.2->yfinance) (2.21)
Note: you may need to restart the kernel to use updated packages.
In [9]:
import yfinance as yf #https://pypi.org/project/yfinance/ - finance API

2 install data¶

In [10]:
data = yf.download("AAPL", start="2000-01-01", end="2023-02-10")
[*********************100%***********************]  1 of 1 completed

3 intelligence analysis of data¶

In [11]:
data.head()
Out[11]:
Open High Low Close Adj Close Volume
Date
2000-01-03 0.936384 1.004464 0.907924 0.999442 0.850643 535796800
2000-01-04 0.966518 0.987723 0.903460 0.915179 0.778926 512377600
2000-01-05 0.926339 0.987165 0.919643 0.928571 0.790324 778321600
2000-01-06 0.947545 0.955357 0.848214 0.848214 0.721931 767972800
2000-01-07 0.861607 0.901786 0.852679 0.888393 0.756128 460734400

Before getting started with the intelligence analysis of data part lets understand the meaning of these feature terms:¶

  • Open - Open means the price at which a stock started trading when the opening bell rang.
  • Close - Close refers to the price of an individual stock when the stock exchange closed shop for the day. It represents the last buy-sell order executed between two traders
  • High - The high is the highest price at which a stock is traded during a period.
  • Low - The low is the lowest price of the period.
  • Adj Close - Adjusted values factor in corporate actions such as dividends, stock splits, and new share issuance
  • Volume - Volume is the total number of shares traded in a security period. ### Why is a Stock’s Closing Price Significant?

  • Stock’s closing price determines how a share performs during the day.

  • When researching historical stock price data, financial institutions, regulators, and individual investors use the closing price as the standard measure of the stock’s value as of a specific date. For example, a stock’s close on December 31, 2019, was the closing price for that day and that week, month, quarter, and year.
  • The difference between the stocks open and close divided by the open is the stock’s return or performance in percentage terms.
In [12]:
data.shape
Out[12]:
(5814, 6)
In [13]:
data.isna().sum() #check None meaning
Out[13]:
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
In [14]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5814 entries, 2000-01-03 to 2023-02-09
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       5814 non-null   float64
 1   High       5814 non-null   float64
 2   Low        5814 non-null   float64
 3   Close      5814 non-null   float64
 4   Adj Close  5814 non-null   float64
 5   Volume     5814 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 318.0 KB
In [15]:
data = data.reset_index() #add index
In [16]:
data
Out[16]:
Date Open High Low Close Adj Close Volume
0 2000-01-03 0.936384 1.004464 0.907924 0.999442 0.850643 535796800
1 2000-01-04 0.966518 0.987723 0.903460 0.915179 0.778926 512377600
2 2000-01-05 0.926339 0.987165 0.919643 0.928571 0.790324 778321600
3 2000-01-06 0.947545 0.955357 0.848214 0.848214 0.721931 767972800
4 2000-01-07 0.861607 0.901786 0.852679 0.888393 0.756128 460734400
... ... ... ... ... ... ... ...
5809 2023-02-03 148.029999 157.380005 147.830002 154.500000 154.264465 154279900
5810 2023-02-06 152.570007 153.100006 150.779999 151.729996 151.498688 69858300
5811 2023-02-07 150.639999 155.229996 150.639999 154.649994 154.414230 83322600
5812 2023-02-08 153.880005 154.580002 151.169998 151.919998 151.688400 64120100
5813 2023-02-09 153.779999 154.330002 150.419998 150.869995 150.639999 56007100

5814 rows × 7 columns

In [17]:
data = data.set_index(data['Date']).sort_index()
In [18]:
data.columns
Out[18]:
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
In [19]:
data["Date"].min(), data["Date"].max() #see min and max means
Out[19]:
(Timestamp('2000-01-03 00:00:00'), Timestamp('2023-02-09 00:00:00'))
In [20]:
data.plot(x="Date", y="Open", figsize=(8,5))
Out[20]:
<AxesSubplot: xlabel='Date'>
In [21]:
data.plot(x="Date", y="Close", figsize=(8,5))
Out[21]:
<AxesSubplot: xlabel='Date'>
In [22]:
data.plot(x="Date", y="High", figsize=(8,5))
Out[22]:
<AxesSubplot: xlabel='Date'>
In [23]:
data.plot(x="Date", y="Low", figsize=(8,5))
Out[23]:
<AxesSubplot: xlabel='Date'>
In [24]:
sns.kdeplot(data['Close'], fill=True)
Out[24]:
<AxesSubplot: xlabel='Close', ylabel='Density'>
In [25]:
data[["Open", "High","Low","Close"]].corr()
Out[25]:
Open High Low Close
Open 1.000000 0.999924 0.999907 0.999802
High 0.999924 1.000000 0.999892 0.999908
Low 0.999907 0.999892 1.000000 0.999910
Close 0.999802 0.999908 0.999910 1.000000

Meaning have strong correlation among the columns, so we can use only one meaning for predict or use MGK.

In [26]:
# As mentioned earlier "When researching historical stock price data,use the closing price as the standard measure of the stock’s value"
# so let's try visualising the close price of the dataset using plotly

fig = px.line(data,x="Date",y="Close",title="Closing Price: Range Slider and Selectors")
fig.update_xaxes(rangeslider_visible=True,rangeselector=dict(
    buttons=list([
        dict(count=1,label="1m",step="month",stepmode="backward"),
        dict(count=6,label="6m",step="month",stepmode="backward"),
        dict(count=1,label="YTD",step="year",stepmode="todate"),
        dict(count=1,label="1y",step="year",stepmode="backward"),
        dict(step="all")
])))

Time Series Decomposition¶

  • We can decompose a time series into trend, seasonal amd remainder components, as mentioned in the earlier section.
  • The series can be decomposed as an additive or multiplicative combination of the base level, trend, seasonal index and the residual.
  • The seasonal_decompose in statsmodels is used to implements the decomposition.

hypothesis 1 - we have daily season and trend¶

In [27]:
series = data['Close']
result = seasonal_decompose(series, model='additive',period=1) # The frequncy is daily
figure = result.plot()

We have trend, but don't have dailyseasonal

hypothesis 2 - we have year season and trend¶¶

In [28]:
series = data['Close']
result = seasonal_decompose(series, model='additive', period=365) # The frequncy is daily
figure = result.plot()

We have trend and year seasonal/ And we see, that we have noises.

4 Modeling LSTM¶

In [29]:
data_10y = data[data["Date"] > data["Date"].max() - timedelta(days=365*10)] #cut dataset, that old meaning don't influence on predict
In [30]:
data_10y["Date"].min(), data_10y["Date"].max()
Out[30]:
(Timestamp('2013-02-12 00:00:00'), Timestamp('2023-02-09 00:00:00'))
In [31]:
data_10y.shape
Out[31]:
(2517, 7)
In [32]:
sns.kdeplot(data_10y['Close'], fill=True)
Out[32]:
<AxesSubplot: xlabel='Close', ylabel='Density'>

Baseline model¶

In [33]:
train_size = int(data_10y.shape[0]*0.8) #count meaning in train dataset
train_data = data_10y[:train_size]
validate_data = data_10y[train_size:]
In [34]:
data_10y.shape, train_data.shape, validate_data.shape
Out[34]:
((2517, 7), (2013, 7), (504, 7))
In [35]:
train_data
Out[35]:
Date Open High Low Close Adj Close Volume
Date
2013-02-12 2013-02-12 17.125357 17.227858 16.705000 16.710714 14.432729 609053200
2013-02-13 2013-02-13 16.686071 16.915714 16.543571 16.678928 14.405277 475207600
2013-02-14 2013-02-14 16.590000 16.844286 16.572144 16.663929 14.392324 355275200
2013-02-15 2013-02-15 16.744642 16.791430 16.425714 16.434286 14.193984 391745200
2013-02-19 2013-02-19 16.467857 16.526072 16.208929 16.428213 14.188734 435783600
... ... ... ... ... ... ... ...
2021-02-03 2021-02-03 135.759995 135.770004 133.610001 133.940002 132.149460 89880900
2021-02-04 2021-02-04 136.300003 137.399994 134.589996 137.389999 135.553329 84183100
2021-02-05 2021-02-05 137.350006 137.419998 135.860001 136.759995 135.133377 75693800
2021-02-08 2021-02-08 136.029999 136.960007 134.919998 136.910004 135.281570 71297200
2021-02-09 2021-02-09 136.619995 137.880005 135.850006 136.009995 134.392303 76774200

2013 rows × 7 columns

In [36]:
train_data["Date"].min(), train_data["Date"].max()
Out[36]:
(Timestamp('2013-02-12 00:00:00'), Timestamp('2021-02-09 00:00:00'))
In [37]:
validate_data["Date"].min(), validate_data["Date"].max()
Out[37]:
(Timestamp('2021-02-10 00:00:00'), Timestamp('2023-02-09 00:00:00'))
In [38]:
train_data[["Low"]]
Out[38]:
Low
Date
2013-02-12 16.705000
2013-02-13 16.543571
2013-02-14 16.572144
2013-02-15 16.425714
2013-02-19 16.208929
... ...
2021-02-03 133.610001
2021-02-04 134.589996
2021-02-05 135.860001
2021-02-08 134.919998
2021-02-09 135.850006

2013 rows × 1 columns

In [39]:
scaler = StandardScaler() #z = (x - u) / s, u - mean, s - standard deviation, x - each x
scaler.fit(train_data[["Close"]]) #normilize train data

def make_dataset(
    df,                 #данные для создания датасета
    window_size,        #кол. элементов для предсказания след. элемента
    batch_size,         #кол. элементов в batch для обучения
    use_scaler=True,    #использовать ли scaler для нормализации свечей
    shuffle=True        #смешивать элементы в датасете или нет
):
  features = df[["Close"]][:-window_size]      #-N предыдущих элементов как фичи, поэтому вычитаем
  if use_scaler:                                  #нормализуем фичи, если нужно
    features=scaler.transform(features)
  data = np.array(features, dtype=np.float32)     #приводим данные к нужному типу
  ds = tf.keras.preprocessing.timeseries_dataset_from_array(    #создание датасета с временным рядом для обучения НС
      data=data,                                  #принимает многомерный массив, кот. содержит фичи
      targets=df["Close"][window_size:],       #принимает лейблы, которые нужно сдвинуть на +N элементов
                                                  #т.е. для каждых N элементов из фичей будем иметь N+1 элемент как лейбл
      sequence_length=window_size,                #принимает длину нашей последовательности window_size
      sequence_stride=1,                          #на сколько нужно сдвигать элемент для создания нового семпла, т.е. предсказываем каждый след. элемент
      shuffle=shuffle,                            #нужно ли перемешивать элементы
      batch_size=batch_size                       #какой размер батча
  )
  return ds  
In [40]:
example_ds = make_dataset(df=train_data, window_size=3, batch_size=2, use_scaler=False, shuffle=False)
In [41]:
example_feature, example_label = next(example_ds.as_numpy_iterator())
In [42]:
example_feature.shape
Out[42]:
(2, 3, 1)
In [43]:
example_label.shape
Out[43]:
(2,)
In [44]:
train_data["Close"][:6]
Out[44]:
Date
2013-02-12    16.710714
2013-02-13    16.678928
2013-02-14    16.663929
2013-02-15    16.434286
2013-02-19    16.428213
2013-02-20    16.030357
Name: Close, dtype: float64
In [45]:
print(example_feature[0])
print(example_label[0])
[[16.710714]
 [16.678928]
 [16.663929]]
16.43428611755371
In [46]:
window_size=10
batch_size=8
train_ds = make_dataset(df=train_data, window_size=window_size, batch_size=batch_size, use_scaler=True, shuffle=True)
val_ds = make_dataset(df=validate_data, window_size=window_size, batch_size=batch_size, use_scaler=True, shuffle=True)
In [47]:
train_ds
Out[47]:
<BatchDataset element_spec=(TensorSpec(shape=(None, None, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.float64, name=None))>
In [48]:
val_ds
Out[48]:
<BatchDataset element_spec=(TensorSpec(shape=(None, None, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.float64, name=None))>
In [66]:
lstm_model = tf.keras.models.Sequential([
  tf.keras.layers.LSTM(32, return_sequences=False),
  tf.keras.layers.Dense(1)
])
    
In [61]:
def compile_and_fit(model, train_ds, val_ds, num_epochs: int = 20): #принимает модель, тренировочный датасет, валидационный датасет и кол. эпох для тренировки
  model.compile(                              #сперва модель компилируется
      loss=tf.losses.MeanSquaredError(),      #передаем функцию потерь, исп. для обучения - Это регресия - предск. вещ. числа - для этой задачи исп. MeanSquaredError
      optimizer=tf.optimizers.Adam(),         #передаем функцию оптимизации, кот. используется для оптимизации функции потерь
      metrics=[tf.metrics.MeanAbsoluteError(), tf.keras.metrics.MeanAbsolutePercentageError()

]    #передается массив метрик, для просмотра как происходит обучение, как наши метрики меняются - растут/уменьшаются или остаются на месте
  )
  history = model.fit(                        #здесь происходит обучение
      train_ds,                               #передаем тренировочный датасет, на котором будет происходить обучение
      epochs=num_epochs,                      #кол. эпох - сколько раз полностью пройдемся по всему тренировочному датасету - сколько раз мы используем все семплы из тренировочного датасета для обучения
      validation_data=val_ds,                 #передаем те данные, по которым можно расчитывать валидационные метрики - на которых можно понять наша модель переобучается или нет
      verbose=0                               #выводить ли промежуточную информацию во время обучения
  )
  return history                              #объект с помощью которого можно напечатать значения метрик во время эпох
In [62]:
start_time = time.time()
lstm_model = tf.keras.models.Sequential([  #группировка слоев
  tf.keras.layers.LSTM(32, return_sequences=False),
  tf.keras.layers.Dense(1)
])
history = compile_and_fit(lstm_model, train_ds, val_ds, num_epochs=100)
print("--- %s seconds ---" % (time.time() - start_time))
--- 59.351494789123535 seconds ---
In [63]:
lstm_model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_2 (LSTM)               (None, 32)                4352      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
In [64]:
plt.plot(history.history['mean_absolute_error'])
Out[64]:
[<matplotlib.lines.Line2D at 0x253f0e553a0>]
In [65]:
plt.plot(history.history['val_mean_absolute_error'])
Out[65]:
[<matplotlib.lines.Line2D at 0x253fd4ea400>]
In [66]:
lstm_model.evaluate(train_ds)
250/250 [==============================] - 1s 2ms/step - loss: 1.3011 - mean_absolute_error: 0.5997 - mean_absolute_percentage_error: 1.2651
Out[66]:
[1.3010823726654053, 0.5997085571289062, 1.265061616897583]
In [67]:
lstm_model.evaluate(val_ds)
61/61 [==============================] - 0s 735us/step - loss: 192.9559 - mean_absolute_error: 9.7338 - mean_absolute_percentage_error: 6.0281
Out[67]:
[192.95591735839844, 9.73377513885498, 6.028115749359131]

Tuning¶

In [68]:
start_time = time.time()
lstm_model = tf.keras.models.Sequential([
  tf.keras.layers.LSTM(32, return_sequences=False),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(1)
])
history = compile_and_fit(lstm_model, train_ds, val_ds, num_epochs=300)
print("--- %s seconds ---" % (time.time() - start_time))
--- 180.8377878665924 seconds ---
In [58]:
lstm_model.evaluate(train_ds)
250/250 [==============================] - 1s 2ms/step - loss: 1.6492 - mean_absolute_error: 0.8115 - mean_absolute_percentage_error: 1.9823 - mean_squared_error: 1.6492
Out[58]:
[1.6491988897323608,
 0.8114990592002869,
 1.9822752475738525,
 1.6491988897323608]
In [69]:
lstm_model.evaluate(val_ds)
61/61 [==============================] - 1s 7ms/step - loss: 122.2312 - mean_absolute_error: 8.7047 - mean_absolute_percentage_error: 5.4921
Out[69]:
[122.23119354248047, 8.704742431640625, 5.49210262298584]
In [70]:
plt.plot(history.history['mean_absolute_error'])
Out[70]:
[<matplotlib.lines.Line2D at 0x253f3de82b0>]
In [71]:
plt.plot(history.history['val_mean_absolute_error'])
Out[71]:
[<matplotlib.lines.Line2D at 0x253f3e12c10>]

LSTM 2¶

In [72]:
import plotly.graph_objects as go
In [73]:
from sklearn.preprocessing import MinMaxScaler
In [74]:
from tensorflow import keras
from tensorflow.keras.layers import Dense,LSTM,Dropout,Flatten
In [82]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
In [75]:
train = train_data['Close'].values #Return a Numpy representation of the DataFrame.
test = validate_data['Close'].values
In [76]:
train
Out[76]:
array([ 16.71071434,  16.67892838,  16.66392899, ..., 136.75999451,
       136.91000366, 136.00999451])
In [77]:
training_values = np.reshape(train,(len(train),1)) #преобразуем массив в массив с кол-вом элементов равному длине выборки
scaler = MinMaxScaler() #Normilize from 0 to 1
training_values = scaler.fit_transform(training_values)
# assign training values
x_train = training_values[0:len(training_values)-1]
y_train = training_values[1:len(training_values)]
x_train = np.reshape(x_train,(len(x_train),1,1))
In [78]:
x_train[:5]
Out[78]:
array([[[0.02138504]],

       [[0.02113904]],

       [[0.02102296]],

       [[0.01924571]],

       [[0.01919871]]])
In [79]:
# creates model
model = Sequential()
model.add(LSTM(128,return_sequences=True,input_shape=(None,1)))
model.add(LSTM(64,return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))

#compile the model
model.compile(optimizer='adam',loss='mean_squared_error')

# Train the model
model.fit(x_train,y_train,epochs=25,batch_size=8)
Epoch 1/25
252/252 [==============================] - 2s 2ms/step - loss: 0.0056
Epoch 2/25
252/252 [==============================] - 1s 2ms/step - loss: 1.2676e-04
Epoch 3/25
252/252 [==============================] - 0s 2ms/step - loss: 1.0512e-04
Epoch 4/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3132e-04
Epoch 5/25
252/252 [==============================] - 0s 2ms/step - loss: 1.0448e-04
Epoch 6/25
252/252 [==============================] - 0s 2ms/step - loss: 1.2065e-04
Epoch 7/25
252/252 [==============================] - 0s 2ms/step - loss: 1.1018e-04
Epoch 8/25
252/252 [==============================] - 0s 2ms/step - loss: 1.4161e-04
Epoch 9/25
252/252 [==============================] - 0s 2ms/step - loss: 1.4750e-04
Epoch 10/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3625e-04
Epoch 11/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3955e-04
Epoch 12/25
252/252 [==============================] - 0s 2ms/step - loss: 1.4683e-04
Epoch 13/25
252/252 [==============================] - 0s 2ms/step - loss: 1.6056e-04
Epoch 14/25
252/252 [==============================] - 0s 2ms/step - loss: 1.4393e-04
Epoch 15/25
252/252 [==============================] - 1s 2ms/step - loss: 1.2705e-04
Epoch 16/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3358e-04
Epoch 17/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3983e-04
Epoch 18/25
252/252 [==============================] - 0s 2ms/step - loss: 1.7239e-04
Epoch 19/25
252/252 [==============================] - 0s 2ms/step - loss: 1.5268e-04
Epoch 20/25
252/252 [==============================] - 0s 2ms/step - loss: 1.2776e-04
Epoch 21/25
252/252 [==============================] - 0s 2ms/step - loss: 1.2659e-04
Epoch 22/25
252/252 [==============================] - 0s 2ms/step - loss: 1.6346e-04
Epoch 23/25
252/252 [==============================] - 0s 2ms/step - loss: 1.3901e-04
Epoch 24/25
252/252 [==============================] - 0s 2ms/step - loss: 1.1434e-04
Epoch 25/25
252/252 [==============================] - 1s 2ms/step - loss: 1.2799e-04
Out[79]:
<keras.callbacks.History at 0x253eb541430>
In [102]:
# assign test and predicted values + reshaping + converting back from scaler
test_values = np.reshape(test, (len(test), 1))
test_values = scaler.transform(test_values)
test_values = np.reshape(test_values, (len(test_values), 1, 1))
predicted_price = model.predict(test_values)
predicted_price = scaler.inverse_transform(predicted_price)
predicted_price=np.squeeze(predicted_price)
16/16 [==============================] - 0s 2ms/step
In [103]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=validate_data['Date'],y=validate_data['Close'],name='Close'))
fig.add_trace(go.Scatter(x=validate_data['Date'],y=predicted_price,name='Forecast_LSTM'))
fig.show()
In [104]:
# evaluate forecasts
mse_lstm = mean_squared_error(test, predicted_price)
print('Test MSE: %.3f' % mse_lstm)
mae_lstm = mean_absolute_error(test, predicted_price)
print('Test MAE: %.3f' % mae_lstm)
Test MSE: 0.291
Test MAE: 0.283
In [105]:
mape_lstm = mean_absolute_percentage_error(test, predicted_price)
print('Test MAPE: %.3f' % mape_lstm)
Test MAPE: 0.002
In [106]:
rmse_lstm = math.sqrt(mean_squared_error(test, predicted_price))
print('Test RMSE: %.3f' % rmse_lstm)
Test RMSE: 0.540
In [107]:
table_predicted = pd.DataFrame(validate_data['Close']) #create DataFrame with predict date
table_predicted['Predicted_price'] = predicted_price
table_predicted
Out[107]:
Close Predicted_price
Date
2021-02-10 135.389999 135.323959
2021-02-11 135.130005 135.060410
2021-02-12 135.369995 135.303680
2021-02-16 133.190002 133.093719
2021-02-17 130.839996 130.712158
... ... ...
2023-02-03 154.500000 154.444962
2023-02-06 151.729996 151.724976
2023-02-07 154.649994 154.591522
2023-02-08 151.919998 151.912323
2023-02-09 150.869995 150.875641

504 rows × 2 columns

In [114]:
Meta = yf.download("META", start="2001-01-01", end="2023-02-08")
[*********************100%***********************]  1 of 1 completed
In [115]:
new_df = Meta.filter(['Close'])
In [116]:
new_df
Out[116]:
Close
Date
2012-05-18 38.230000
2012-05-21 34.029999
2012-05-22 31.000000
2012-05-23 32.000000
2012-05-24 33.029999
... ...
2023-02-01 153.119995
2023-02-02 188.770004
2023-02-03 186.529999
2023-02-06 186.059998
2023-02-07 191.619995

2698 rows × 1 columns

In [117]:
last_60_days = new_df[-60:].values
X_test = scaler.fit_transform(last_60_days)
In [118]:
X_test = np.reshape(X_test, (X_test.shape[0],X_test.shape[1], 1))
pred_price = model2.predict(X_test)
pred_price = scaler.inverse_transform(pred_price)
print(pred_price)
2/2 [==============================] - 0s 2ms/step
[[112.409615]
 [113.52345 ]
 [114.686676]
 [117.46263 ]
 [113.72695 ]
 [112.00305 ]
 [112.58391 ]
 [110.46506 ]
 [111.99338 ]
 [112.76788 ]
 [111.96434 ]
 [109.42144 ]
 [110.07842 ]
 [118.45381 ]
 [120.72981 ]
 [123.70034 ]
 [122.667496]
 [114.5897  ]
 [114.40547 ]
 [115.763466]
 [116.31671 ]
 [115.16191 ]
 [120.44759 ]
 [121.84936 ]
 [116.559425]
 [119.74709 ]
 [114.93883 ]
 [117.47235 ]
 [120.06813 ]
 [117.50149 ]
 [118.3955  ]
 [117.26834 ]
 [116.04492 ]
 [120.55464 ]
 [120.63249 ]
 [124.91892 ]
 [127.48468 ]
 [127.065025]
 [130.07214 ]
 [129.53496 ]
 [132.97409 ]
 [132.87634 ]
 [136.61089 ]
 [136.8749  ]
 [135.29088 ]
 [133.0034  ]
 [136.0633  ]
 [139.2121  ]
 [143.02588 ]
 [142.89877 ]
 [141.29509 ]
 [146.9653  ]
 [151.30176 ]
 [146.73076 ]
 [148.59692 ]
 [152.64844 ]
 [187.02283 ]
 [184.89705 ]
 [184.45024 ]
 [189.71873 ]]
In [119]:
Meta = yf.download("META", start="2023-02-07", end="2023-02-09") 
print(Meta['Close'])
[*********************100%***********************]  1 of 1 completed
Date
2023-02-07    191.619995
2023-02-08    183.429993
Name: Close, dtype: float64
In [ ]:
aapl = yf.Ticker("AAPL")

# get stock info
aapl.info

# get historical market data
hist = aapl.history(period="max")

# show actions (dividends, splits)
aapl.actions

# show dividends
aapl.dividends

# show splits
aapl.splits

# show financials
aapl.financials
aapl.quarterly_financials

# show major holders
aapl.major_holders

# show institutional holders
aapl.institutional_holders

# show balance sheet
aapl.balance_sheet
aapl.quarterly_balance_sheet

# show cashflow
aapl.cashflow
aapl.quarterly_cashflow

# show earnings
aapl.earnings
aapl.quarterly_earnings

# show sustainability
aapl.sustainability

# show analysts recommendations
aapl.recommendations

# show next event (earnings, etc)
aapl.calendar

# show ISIN code - *experimental*
# ISIN = International Securities Identification Number
aapl.isin

# show options expirations
aapl.options

# get option chain for specific expiration
#opt = aapl.option_chain('YYYY-MM-DD')
# data available via: opt.calls, opt.puts
     
In [128]:
#https://habr.com/ru/post/485890/
#https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21